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Abstract 

We illustrate uniform quadratic majorization, sharp quadratic majorization, and 
sublevel quadratic majorization using the example of a univariate cubic. 
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Note: This is a working paper which will be expanded/updated frequently. All suggestions 
for improvement are welcome. The directory deleeuwpdx.net/pubfolders/cubic has a pdf 
version, the complete Rmd hie with all code chunks, the bib hie, and the R source code. 

1 Introduction 

Suppose X is the closed interval [L, U\, and / : X — > M. A function g \ X (&X —^ 1 is a 
majorization scheme for / on X if 

• g(x, x ) = f(x) for all i£l, 

• g(x, y ) > f(x) for all x, y G X. 

In other words for each y the global minimum of g(x, y ) — f(x) over x G X is zero, and it is 
attained at y. If the functions / and g are differentiable and the minimum is attained at an 
interior point of the interval, we have T>ig(x,y) = T>f(x). If the functions are in addition 
twice differentiable we have T>iig(x,y) > V 2 f(x). 

The majorization conditions are not symmetric in x and y, and consequently it sometimes is 
more clear to write g y (x) for g(x,y), so that g y : X —> M. We say that g y majorizes / on X at 
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y, or with support point y. 


A majorization algorithm is of the form 

a;( fe+1 ) G argming(a;, x^) 

xEX 


It then follows that 


f(x {k+1) ) < g(x {k+1 \x {k) ) < g(x {k \x^ k) ) = f(x {k) ), (1) 

Thus a majorization step decreases the value of the objective function. The chain (1) is 
called the sandwich inequality. In (1) the inequality /(ad fc+1 )) < g(x^ k+1 \x^) follows from 
majorization, the inequality g(x^ k+1 \ x^) < g{x^ k \x^) follows from minimization. This 
explains why majorization algorithms are also called MM algorithms (Lange (2016)). Using 
the MM label has the advantage that it can also be used for the dual family of minorization- 
maximization algorithms. 

In this note we are interested in quadratic majorization , i.e. in majorization functions of the 
form 


9 {x,y) = f(y) + f'(y)(x-y) + (x-y) 2 , 


( 2 ) 


and specifically on quadratic majorizers of a cubic on an closed interval of the real line. If 
K < 0 in (2) the majorizaton function is concave and attains its minimum at one of endpoints 
of X. If K > 0 then define the algorithmic map A(x) = x — f'(x)/K and 


(L if A(x^)<L, 

x ( k +i) = i ^( x (fc)) if l < A(x^) < U, 
(U if A(x^) > U. 


Assuming that the sequence x^ converges to a fixed point x^ of A, i.e a point with 
T>f(x oo) = 0, the rate of convergence is 


p(tx>) 


^ 2 f(x oo) 

K 


2 Majorizing a Cubic 

Suppose / is a non-trivial cubic, with non-zero leading coefficient. The function and its 
derivatives are 


f(x) — d + cx + bx 2 + ax 3 , 
f\x) = c + 2 bx + 3 ax 2 , 
f"{x) = 2 b + 6 ax, 
f"\x) = 6a. 
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2.1 Uniform Quadratic Majorization 

Define 


K 0 = max . f"{x) = ma x(f"(A), f"(B)). 

L<x<U 


Thus 


K 0 = 


6 aU + 2b 
6 aL + 2b 


if a > 0, 
if a < 0. 


Our majorization function is the quadratic 


9(x,y) ■= f{y) + f{y)(x-y) + ^K 0 (x-y) 2 . 


The corresponding majorization algorithm is 


^(fc+i) _ a rgminc/(:r, x^) 

L<x<U 


Note the majorizing function can be a concave quadratic, in which case its minimum is 
always at one of the endpoints of the interval. Assuming we eventualy converge to a value 
L < Xoo < U the convergence rate (or asymptotic error constant) is 


P(X oo ) 


till \ { 6a(t/-Zoo) 

■I hyoO _ I QaU +26 

I( n | 6a(L—X qq) 

U f 6aL+2b 


if a > 0, 
if a < 0. 


Note that this does not depend on the lower limit L of the interval if a > 0. Convergence is 
faster if the upper limit U happens to be close to x^, and in fact it can be close to zero. If 
U —y oo the rate of convergence goes to one. 

As a first, example, consider f(x) = |(1 — 2x + a; 3 ). This cubic has roots at -1.6180339887, 
0.6180339887, 1. There is a local maximum at —1\/6 and a local minimum at ^V6. The 
function (red), its first derivative (blue), and its second derivative (green) are in figure 1. 
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Figure 1: Example Cubic 

We will look for a local minimum in the interval [—2,2], stating with initial value 1. Note that 
f"(x) = x and thus K 0 = B = 2. At Xoo = |a/ 3 we have p(x 00 ) = 1 — |\/3, he. approximately 
0.7113248654. We report the results of the final iteration. 

## Iteration: 35 xinit: 1.00000000 xfinal: 0.57735207 rate: 0.71132334 

If we look for a local minimum in the interval [0,1] instead of [—2,2], we get much faster 
convergence, because the upper bound is now much closer to the solution. 

## Iteration: 14 xinit: 0.50000000 xfinal: 0.57734974 rate: 0.42265183 

If we start at a value to the left of —1\/3 and look for a minimum in [—2, 2] then the algorithm 
converges to the boundary at —2. 
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## Iteration: 3 xinit: -1.50000000 xfinal: -2.00000000 rate: 0.00000000 

Note that alteratively we could have used 

K+ := max \f"(x)\ 

L<x<U 

in our majorization frictions. Since f" is linear we see that | f”\ is convex, and thus Kq = 
max(| f"(L)\, \f"{U)\. Using Kq gives a majorization which is generally less precise, but uses 
majorization functions that are always convex quadratics. Also note that if there is a strict 
local minimum in X then K 0 > o, although we can still have K 0 < Kq. Think of [—1, .75], 
for which Kq = .75 and Kq = 1. 

2.2 Sharp Quadratic Majorization 

Consider the general representation of a cubic around the point y 

f{x) = f(y) + f(y)(x -y) + ^f"(y)(x - yf + - yf. 

For a quadratic function of the form (2) we have 

g(x,y)-f(x) = \(xx. y) 2 {K - f"(y) - !/"'(* -</)}• (3) 

Thus the quadratic function is a majorizer if 

K > max f"(y) + \f'{x - y) (4) 

xe.z o 

which works out to K > f"(y) + | max ( f'"(U — y), f'"(L — y))- We get the sharp quadratic 
majorization (De Leeuw and Lange (2009)) by choosing K equal to its lower bound. The 
corresponding rate in the cubic case, with positive leading coefficient, is 

, X \f"\U-X oo) 

P[XOO) f"( x oo) + — £00) ' 

If we reanalyze our example with the sharp bound we find faster convergence in the first two 
computing runs. 

## Iteration: 17 xinit: 1.00000000 xfinal: 0.57735073 rate: 0.45096147 

## Iteration: 8 xinit: 0.50000000 xfinal: 0.57735010 rate: 0.19615217 

We also find convergence to the local minimum, and not to the nearby boundary, in the third 
run. 

## Iteration: 18 xinit: -1.50000000 xfinal: 0.57735105 rate: 0.45096119 


5 



2.3 Sublevel Quadratic Majorization 

We know that quadratic majorization of a cubic on the whole line is impossible. This is one 
of the reasons for looking at quadratic majorization on a closed interval, where a continuous 
second derivative is always bounded. In this section we relax the majorization requirements, 
using a closed interval that depends on the current solution and becomes smaller if we get 
closer to a minimum. The reuslting majorization method can be thought of as a safeguarded 
version of Newton’s method. 


A function g \ X ®X — » M is a sublevel majorization scheme for / on X if 

• g(x, x ) = f(x) for all x E X, 

• g(x,y ) > f(x) for all x, y E X for which g(x,y) < g(y,y). 

The second part of the definition says that g majorizes / on the sublevel set {x E X \ g(x,y) < 
g(y,y)}- If we minimize the sublevel majorization the sandwich inequality (1) is still valid, 
so we still have monotone convergence of function values. 


We quickly specialize this to quadratic majorization functions, suppose X is the whole real 
fine, and also require K >0. If g is given by (2) then the sublevel set is the interval between 
y and y — 2 f'(y)/K. Note that either of the two bounds can be the smaller one. Thus we 
want inequality 

K > f"(y) + \f"(x - y) 

on the sublevel set, or equivalently at both endpoints. Thus K > max(/"(j/), 0) and 


This means we must have 


K > f"(y ) 


2 rm 

3 K 


K 2 - Kf"(y ) + | > 0. (5) 

Define K(y) as the smallest K > max(/"(t/), 0) satisfying (5), and we have sharp sublevel 
quadratic majorization (De Leeuw (2006)). 

If the quadratic equation corresponding to (5) has no real roots or a single real root, then the 
inequality (5) is satisfied for all K, and thus K(y) = max(/"(y), 0). If the equation has two 
real roots, they are written as p(y) < q(y). We have p(y) + q{y) = f"{y)- Thus if p(y) and 
q(y) are non-negative, then 0 < p(y) < q(y) < f"(y), and consequently K(y) = f"(y) > 0. 
If p{y) < 0 and q(y) > 0 then q(y) = f"{y) — p(y) > f"(y ) and thus K(y) = q(y). If both 
p(y) < 0 and q < 0 then f"(y) < p(y) < q(y) < 0, and thus K(y) = 0 and the sharp sublevel 
quadratic is linear. 

Figure 2 shows for y — \ arid various values of K what sublevel majorization looks like. The 
function is in red, the quadratic sublevel majorization in blue. Note the different lengths 
of the sublevel intervals. We see that K = | is too small. For y — \ the quadratic is 
K 2 = K — | K — which has roots -0.0504626063, 0.5504626063 and thus the sharp sublevel 
quadratic has K equal to 0.5504626063. 
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Figure 2: Sublevel Majorization 

If we are close to a strict local minimum x we have f'(x) ~ 0 and f"(x) > 0. Thus 
the quadratic will have one root approximately zero and one root approximately equal to 
f"(x), and the iteration is basically a Newton iteration. Thus, at least for cubics, sublevel 
majorization has quadratic convergence. We illustrate this by analyzing our small example 
with sublevel majorization, starting from x = 1 and x = \. 


## 

Iteration: 

1 

xold: 

1.00000000 

xnew: 

0.66666667 

cnew: 

0.33333333 

rate 

## 

Iteration: 

2 

xold: 

0.66666667 

xnew: 

0.58333333 

cnew: 

0.08333333 

rate 

## 

Iteration: 

3 

xold: 

0.58333333 

xnew: 

0.57738095 

cnew: 

0.00595238 

rate 

## 

Iteration: 

4 

xold: 

0.57738095 

xnew: 

0.57735027 

cnew: 

0.00003068 

rate 

## 

Iteration: 

5 

xold: 

0.57735027 

xnew: 

0.57735027 

cnew: 

0.00000000 

rate 

## 

Iteration: 

1 

xold: 

0.50000000 

xnew: 

0.57569391 

cnew: 

0.07569391 

rate 

## 

Iteration: 

2 

xold: 

0.57569391 

xnew: 

0.57734948 

cnew: 

0.00165557 

rate 

## 

Iteration: 

3 

xold: 

0.57734948 

xnew: 

0.57735027 

cnew: 

0.00000079 

rate 


If X = [L, U} then our analysis must be modified slightly. We want inequality (4) on the 
intersection of the sublevel interval and [L,U]. If the sublevel interval is in [L, U] our previous 
analysis applies. Because we always have y G [L, U] the other possible intervals are [y, U ] if 
y <U < y - 2 f(y)/K and [L, y] if y - 2 f{y)/K <L<y. 
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3 Appendix: Code 

3.1 auxilary.R 


mprint <- function (x, d = 2, w = 5) { 
print (noquote (formate ( 
x, di = d, wi = w, fo = "f" 

))) 


cobwebPlotter <- 
function (xold, 
func, 
lowx = 0, 
hghx = 1, 
lowy = lowx, 
hghy = hghx, 
eps = le-10, 
itmax = 25, 

...) { 

x <- seq (lowx, hghx, length = 100) 
y <- sapply (x, function (x) 
func (x, 
plot ( 
x, 

y> 

xlim = c(lowx , hghx), 
ylim = c(lowy, hghy), 
type = "1" , 
col = "RED", 
lwd = 2 

) 

abline (0, 1, col = "BLUE") 
base <- 0 
itel <- 1 
repeat { 

xnew <- func (xold, ...) 
if (itel > 1) { 

lines (matrix (c (xold, xold, base, xnew), 2, 2)) 

} 

lines (matrix(c(xold, xnew, xnew, xnew), 2, 2)) 
if ((abs (xnew - xold) < eps) I| (itel == itmax)) { 
break () 

} 


base <- xnew 
xold <- xnew 
itel <- itel + 1 

> 

} 

minQuadratic <- function (a, lw, up) { 
f <- polynomial (a) 
fup <- predict (f, up) 
flw <- predict (f, lw) 
if (a[3] <= 0) { 

if (fup <= flw) return (list (x = up, f = fup)) 
if (fup >= flw) return (list (x = lw, f = flw)) 

} 

xmn <- - a[2] / (2 * a[3]) 
fmn <- predict (f, xmn) 

if (xmn >= up) return (list (x = up, f = fup)) 
if (xmn <= lw) return (list (x = lw, f = flw)) 
return (list (x = xmn, f = fmn)) 


3.2 iterate.R 

mylterator <- 

function (xinit, 

f, 

eps = le-6, 
itmax = 100, 
verbose = FALSE, 
final = TRUE, 

...) { 

xold <- xinit 
cold <- Inf 
itel <- 1 
repeat { 

xnew <- f (xold, ...) 
cnew <- abs (xnew - xold) 
rate <- cnew / cold 
if (verbose) 
cat ( 

"Iteration: ", 

formate (itel, width = 3, format = "d"), 
"xold: ", 
formate ( 
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xold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"xnew: ", 
formate ( 
xnew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"cnew: ", 
formate ( 
cnew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"rate: ", 
formate ( 
rate, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if ((cnew < eps) I I (itel == itmax)) 
break 

xold <- xnew 
cold <- cnew 
itel <- itel + 1 

> 

if (final) 
cat ( 

"Iteration: ", 

formate (itel, width = 3, format = "d") , 
"xinit: ", 
formate ( 
xinit, 
digits = 8, 
width = 6, 

format = "f" 
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), 

"xfinal: ", 
formate ( 
xnew, 

digits = 8, 
width = 6, 

format = "f" 

), 

"rate: ", 

formate ( 
rate, 

digits = 8, 
width = 6, 

format = "f" 

), 

"\n" 

) 

return (list ( 
itel = itel, 
xinit = xinit, 
xfinal = xnew, 
change = cnew, 
rate = rate 

)) 

} 

cubicUQ <- function (x, a, up, lw, sharp = FALSE) { 
f <- polynomial (a) 
g <- deriv (f) 
h <- deriv (g) 
i <- deriv (h) 
if ( ! sharp) { 
if (a [4] > 0) 

k <- predict (h, up) 
if (a [4] < 0) 

k <- predict (h, lw) 

> 

if (sharp) { 
if (a[4] > 0) 

k <- predict (h, x) + predict (i, x) * (up - x) / 3 
if (a[4] < 0) 

k <- predict (h, x) + predict (i, x) * (lw - x) / 3 

} 

xmin <- x - predict (g, x) / k 
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if ((xmin <= up) && (xmin >= lw)) 
return (xmin) 
fup <- predict (f, up) 
flw <- predict (f, lw) 
return (ifelse (fup < flw, up, lw)) 


3.3 sublevel.R 

tester <- function (y, k, func, grad) { 

qmaj <- function (x) func (y) + grad (y) * (x - y) + .5 * k * (x - y) 2 
ybnd <- y - 2 * grad (y) / k 
up <- max (y, ybnd) 
lw <- min (y, ybnd) 
x <- seq (lw, up, length = 100) 

s <- paste ("K = ", formatC(k, digits = 1, format= "f")) 

plot (x, func (x), col = "RED", lwd = 2, type = "1", ylab = "f,g", main = s) 
lines (x, qmaj (x), col = "BLUE", lwd = 2) 
abline (v = up) 
abline (v = lw) 


f <- function (x) log (1 + exp (x)) 
g <- function (x) exp (x) / (1 + exp (x)) 

a <- function (x) (x ~ 3 - x) / 6 
b <- function (x) (3*x~2-l)/6 

cubicSublevel <- function (y, a, sharp = FALSE) { 
f <- polynomial (a) 
g <- deriv (f) 
h <- deriv (g) 
i <- deriv (h) 
dfy <- predict (g, y) 
dgy <- predict (h, y) 
dhy <- predict (i, y) 

disk <- dgy ~2-4*(2/3)* dhy * dfy 
if (disk <= 0) k <- max (0, dgy) 
else { 

r <- sort (solve (polynomial (c((2 / 3) * dhy * dfy, -dgy, 1)))) 

if ( (r [1] >= 0) && (r[2] >= 0)) k <- dgy 

if ( (r [1] <= 0) && (r[2] >= 0)) k <- r[2] 

if ((r[1] <= 0) && (r [2] <= 0)) stop ("unbounded") 
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} 

return (y - dfy / k) 

} 
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